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Editor's forumn

It's hard to believe that the last QHJ came out last May.  What have I
been doing?  Well, let me tell you.  Since May I have had a number of
life changes that have kept me busy.

The first is a change in jobs.  I decided to leave the Air Force and
seek employment else where.  I spent a number of months looking through
technical career newspapers and various technical job related web
pages, looking for job openings.  I found the San Jose Mercury News
Talent Center to be about the best place to look, esp. for the SF Bay
Area.

Related to the first change, was leaving my job.  I had to finish a few
tasks and then document my job so I could pass it along to someone
else.  Documenting what you know is not as easy as it sounds.  I also
had to spend some time outprocessing from the service.  It takes
paperwork to get in the sevice, and it takes even more to get out.

The final and biggest change was moving from Dayton, OH, back to the SF
Bay Area.  Getting the house ready for moving and getting it ready to
sell took a while.  I had to do some painting, replace a few doors (one
cracked and one warped), patch some mortar on the brick outside of the
house, and a few other household chores.

This all left very little time for hobbies.  About the only time I used
the QL was writing cover letters and printing Resumes.  And since the
move my access to the Net, esp. USENET has been limited.

I am waiting for my house to sell in Ohio, so I moved into an
apartment.  This meant that I had to put a number of household goods in
storage.  The movers did not do a good job of putting the right stuff
in the right boxes so I could get what I needed off the moving van and
put the stuff I did not need in storage.  This meant that my QL is with
me, but the disk drives, power supply, mouse, and modem cable are in
storage.  I've had to borrow disk drives, a PC power supply, and a QL
power supply to get the QL up and running.  I still have to make a
modem cable.  I'm using my Z88 for my telecomm. needs, and it's tough
finding an Internet Service Provider that supports 8 lines of display
(real tough).  Once I get a modem cable built I should be able to read
comp.sys.sinclair.

Speaking of the Z88, most of this issue has been written on the Z88
while riding BART (the local commuter rail system) to work.  I have
about a 50 minute BART ride, so I have lots of time to put to good use.

And also speaking of work, I am now working for a company in Berkeley
called Project Technology.  They were founded by Sally Schelor and
Steve Melor, creators of the SM Object Oriented Analysis Method.  My
job is to maintain the Sun Unix boxes and the PC's.

While I've been busy doing non-QHJ things, I noticed that no one sent
me e-mail asking where the next QHJ issue was.  I'm not too sure if
this is a good sign or not.  Granted it was nice not to be bugged, but
then I have to wonder if the QHJ was missed.

One thing you will notice with this issue is the number of articles

with no code.  I have not had the time to sit and code at the QL, so
I've written some articles and covered what code was necessary with
pseudo-code.

Well, that's about enough for me.  Oh, since I have just moved, please
note the new snail mail address, but don't write it down in ink.  I
hope to buy a house sometime around the March or April '97 timeframe.
Here now the newsletter.


Exclusive OR Encryption

I've always been interested in encryption.  Keeping my files safe from
prying eyes has been more of a want than a need.  Plus encryption is a
neat programming problem to solve.  Many years ago I wrote a
probram called QL Crypt that was my first look at encryption.  In
QHJ #XX there was Complex Ascii Rotation (CAR) that was aimed at
encrypting mail messages just enough to make them secure from
casual observers.  There are many other ways to encrypt files, each
with it's own level of safety.

Encryption is based on two parts, the Method and the Key.  The Method
is what various computations are perfomed to get from the clear text to
the encrypted text.  This is equivalent to a lock.  The Key is the
chunk of data used to make one encryption different than an other.
Since the encryption Method does not change, it is the Key that makes
your text encrypted different from somebody elses.  This is the
equivilent to, well, a key.  A specific model of lock is manufactured
into a thousands of individual locks.  These locks all look and work
the same.  It is the key that makes each one secure and different from
the others.

There are many methods used in encryption, from the very easy to break,
to the damn near impossible.  The harder to break, the more computation
necessary to encrypt.  If you are worried about wasting computational
cycles, then you need only implement the Method that secures the
information to the level you need it.  Securing a Christmas gift list
is different than securing company trade secrets.

QL Crypt and CAR both used a character rotation Method for encryption.
As each character was read in, a value of 1-4 would be added to their
chracter value (CHR$), based on the Key, and then output to the
resultant file.  QL Crypt allowed the encryption of binary files, CAR
stayed with pure ASCII text so that it could be sent in e-mail.

Each one of these Methods, and many more, require the use of two
functions that are the opposite of each other.  In character rotation,
a value would be added to enrypt, and subtracted to decrypt.  What ever
gyrations you go through to encrypt you must reverse to decrypt.
Exclusive OR encryption does not have two opposite functions because
Exclusive OR is the oppostie of itself.

Exclusive OR (XOR)

      Bit 1  Bit 2     XOR

        0      0        0
        1      0        1

        0      1        1
        1      1        0

When using Exclusive OR with a bit pattern, what you XOR it with is
usually called the Mask.  To show you how XOR is the opposite of itself
let take a look at the binary pattern 010110 XORed with the mask
111111.

       Bit   Mask    XOR          Bit   Mask    XOR
        0     1       1            1     1       0
        1     1       0            0     1       1
        0     1       1            1     1       0
        1     1       0            0     1       1
        1     1       0            0     1       1
        0     1       1            1     1       0

Notice that after XORing the bit pattern with the mask and then XORing
the resultant bit pattern with the mask the original bit pattern
returns.  This means that writing the program to implement XOR
encryption does not require the writing of an encryption routine and a
decryption routine, only one is XOR routine is needed.

The Mask that is used in the XOR routine is derived from the Key.  How
secure you data is, is dependent on the Key and its length.  If you use
a Key of length one (1 byte) then it would take only 256 tries to break
the encryption.  The longer the Key, the more tries necessary to break
the encryption.

QL Crypt used the random number table in the QL as the key.  A password
was entered from the user, which then was used as the sed value for the
random number table.  This makes for very strong encryption (as the
random number table is fairly large and makes a long Key), but it make
it impossible to port to other platforms.  Even differences in QL ROMs
could cause the program to fail.

CAR used a ASCII password entered by the user.  This makes the program
very portable, but also makes it a weaker form of encryption.  If the
user typed in a fairly long password, then the level of secureness
would go up.

Constructing a Spell Checker

A spell checker is usually comprised of two parts: 1) word lookup (to
see if a word is spelled correctly) and 2) word suggestion (to suggest
the correct spelling of the word).  Past issues of the QHJ have looked
at different algorithms to tell how close two words are, a key part of
word suggestion.  This article will focus on word lookup.

The key thing to decide in creating the word lookup algorithm is the
data structure for storing the words and quickly looking them up. If
the word list was fairly short, a brute force method would work.  Since
most spell checkers will need a word list in the tens of thousands, the
lookup algorithm will need to be smarter.  We also need to keep in mind
that the words will be of many different lengths.

At first the most obvious data structure would be a tree structure.  A
word would walk down the tree structure letter by letter.  When it
reached the end of its length it would check the current tree node to

see if it is a valid word.  Let's take a look at three words, bar,
bard, and barr, and the following tree structure.

                      b
                    /   \
                    a    b
                  /   \
                 r     m
               /  \
              d    n

With the word BAR, the B is valid, with leads to an A which is valid
and it leads to an R which is valid.  The R node will have a value of 1
to signify that it is the end of a valid word.  This way the structure
can parse both BAR and BARN and distinguise between the two.  When
parsing BARR, the B is fine, which leads to A, which leads to R, but
now there  is no R path in the tree and the word is determined to be
invalid.

The problem with this data structure is two fold: one, you need to
construct it out of the dictionary file at run time, which can take
some time, or you need to find a way to store it so it can be read in
easily.  The second problems is that the language we are going to
contruct the spell checker in is SuperBasic, which does not easily
support making tree structures.  They are easily created with C
structures or Pascal records.

We could use a hashing algorithm since it is designed for very quick
look up, but with a very static list of words, our hashing algorithm
may require more dataspace than we really need.

We need to come up withh some data structure that is tailored to our
needs.  One that will provide a fairly quick look up and minimize on
the data space needed to store the word list.

Here is a suggestion:  Store the words in a flat array.  The words will
be pre-sorted on disk, first by the length of the word and then
alphabetically.  This means that all of the two letter words will be
grouped together and sorted alphabetically, then the three letter
words, etc.  Word length is one way to distinquish one word from an
other.

Create a two-dimensional array called start_array(x,y).  The X value
will be LENGTH and the Y value will be FIRST_CHAR.  As the words are
read in, the array will be used to keep track of where the first 2
letter starts in the array, where the first three letter word starts,
and so on.  It will also keep track of where words start by the first
letter.  When you need to do a lookup of the word BAR, LENGTH is 3,
FIRST_CHAR is equal to B, so you would look up start_array(3,'B').
this will return where the first 3 letter word that starts with B is
stored in the word array.  From there the search can be a simple brute
force search that compares all three letter B words to see if they
match BAR.

To determine where the search should end, you will also need to know
where the first three letter C word is at.  This can also be looked up
in the start_array.  Below is a little pseudo code showing how this
would work.


    start = start_array(3,'B')
    stop  = start_array(3,'C')

    FOR x = start TO stop
       IF word_array$(x) = BAR THEN EXIT success
    NEXT x
    EXIT fail


